
Keyword Search Result

[Keyword] neural network (855 hits)

Showing 161-180 of 855 hits

  • Estimation of Switching Loss and Voltage Overshoot of Active Gate Driver by Neural Network

    Satomu YASUDA  Yukihisa SUZUKI  Keiji WADA  

     
    BRIEF PAPER

      Publicized: 2020/05/01
      Vol: E103-C No:11
      Page(s): 609-612

    An active gate driver IC that generates arbitrary switching waveforms has been proposed to reduce the switching loss, the voltage overshoot, and the electromagnetic interference (EMI) by optimizing the switching pattern. However, finding the optimal switching pattern is hard because the number of possible pattern combinations is huge. In this paper, a method to estimate the switching loss and the voltage overshoot from the switching pattern with a neural network (NN) is proposed. The implemented NN model obtains reasonable learning results on the data sets.
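
The estimation step described above can be sketched as a small regression network; everything below (pattern length, layer sizes, random weights) is invented for illustration and is not the authors' trained model:

```python
import numpy as np

# A minimal sketch (not the authors' model): a two-layer MLP that maps a
# binary gate-drive switching pattern to two regression outputs, the
# estimated switching loss and the voltage overshoot.
rng = np.random.default_rng(0)
PATTERN_LEN = 16   # assumed number of time slots in one switching pattern
HIDDEN = 32

W1 = rng.normal(0, 0.1, (PATTERN_LEN, HIDDEN))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(0, 0.1, (HIDDEN, 2))  # outputs: [loss, overshoot]
b2 = np.zeros(2)

def estimate(pattern: np.ndarray) -> np.ndarray:
    """Forward pass: 0/1 switching pattern -> [switching loss, overshoot]."""
    h = np.maximum(0.0, pattern @ W1 + b1)  # ReLU hidden layer
    return h @ W2 + b2

pattern = rng.integers(0, 2, PATTERN_LEN).astype(float)
pred = estimate(pattern)
```

In the paper the weights would come from training on measured loss/overshoot data; here the forward pass only illustrates the pattern-to-estimate mapping.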

  • Co-Design of Binary Processing in Memory ReRAM Array and DNN Model Optimization Algorithm

    Yue GUAN  Takashi OHSAWA  

     
    PAPER-Integrated Electronics

      Publicized: 2020/05/13
      Vol: E103-C No:11
      Page(s): 685-692

    In recent years, deep neural networks (DNNs) have achieved considerable results on many artificial intelligence tasks, e.g., natural language processing. However, the computational complexity of DNNs is extremely high. Furthermore, the performance of the traditional von Neumann computing architecture has been slowing down due to the memory wall problem. Processing in memory (PIM), which places computation within memory and reduces data movement, breaks the memory wall. ReRAM PIM is considered a viable architecture for DNN accelerators. In this work, a novel design of a ReRAM neuromorphic system is proposed to process DNNs fully in-array efficiently. The binary ReRAM array is composed of 2T2R storage cells and current-mirror sense amplifiers. A dummy-BL reference scheme is proposed for reference voltage generation. A binary DNN (BDNN) model is then constructed and optimized on the MNIST dataset. The model reaches a validation accuracy of 96.33% and is deployed to the ReRAM PIM system. A co-design model optimization method between the hardware device and the software algorithm is proposed, built on the idea of utilizing hardware variance information as uncertainty in the optimization procedure. This method is shown to achieve a feasible hardware design and a generalizable model. Deployed with such a co-designed model, the ReRAM array processes DNNs with high robustness against fabrication fluctuation.
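
The co-design idea of treating device variance as uncertainty can be illustrated roughly as follows; the noise model and sizes are assumptions, not the paper's circuit-level formulation:

```python
import numpy as np

# Sketch of the co-design idea (assumed details): weights are binarized to
# +/-1 for the 2T2R ReRAM array, and device fabrication variance is modeled
# as additive Gaussian noise on the binary weights during training-time
# forward passes, so the learned model sees hardware fluctuation.
rng = np.random.default_rng(1)

def binarize(w: np.ndarray) -> np.ndarray:
    return np.where(w >= 0, 1.0, -1.0)

def noisy_forward(x, w_real, sigma=0.1):
    w_bin = binarize(w_real)                            # what the array stores
    w_dev = w_bin + rng.normal(0, sigma, w_bin.shape)   # device variance
    return x @ w_dev

w = rng.normal(0, 1, (8, 4))
x = rng.normal(0, 1, (2, 8))
y = noisy_forward(x, w)
```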

  • Construction of an Efficient Divided/Distributed Neural Network Model Using Edge Computing

    Ryuta SHINGAI  Yuria HIRAGA  Hisakazu FUKUOKA  Takamasa MITANI  Takashi NAKADA  Yasuhiko NAKASHIMA  

     
    PAPER-Fundamentals of Information Systems

      Publicized: 2020/07/02
      Vol: E103-D No:10
      Page(s): 2072-2082

    Modern deep learning has significantly improved performance and has been used in a wide variety of applications. Since the amount of computation required for the inference process of a neural network is large, it is processed not at the data acquisition location, such as a surveillance camera, but by servers with abundant computing power installed in data centers. Edge computing is receiving considerable attention to solve this problem. However, edge computing can provide only limited computation resources. Therefore, we assumed a divided/distributed neural network model using both the edge device and the server. By processing part of the convolution layers on the edge, the amount of communication becomes smaller than that of the sensor data. In this paper, we have evaluated AlexNet and eight other models in the distributed environment and estimated FPS values with Wi-Fi, 3G, and 5G communication. To reduce communication costs, we also introduced a compression process before communication. This compression may degrade the object recognition accuracy. As necessary conditions, we set FPS to 30 or faster and object recognition accuracy to 69.7% or higher. This value is determined based on an approximation model that binarizes the activations of a neural network. We constructed performance and energy models to find the optimal configuration that consumes minimum energy while satisfying the necessary conditions. Through a comprehensive evaluation, we found the optimal configurations for all nine models. For small models, such as AlexNet, processing the entire model on the edge was best. On the other hand, for huge models, such as VGG16, processing the entire model on the server was best. For medium-size models, the distributed models were good candidates. We confirmed that our model found the most energy-efficient configuration while satisfying the FPS and accuracy requirements, and the distributed models successfully reduced the energy consumption by up to 48.6%, and by 6.6% on average. We also found that HEVC compression is important before transferring the input data or the feature data between the distributed inference processes.
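
The configuration search described above reduces to a small constrained minimization; the candidate table below is entirely invented, and only the selection logic mirrors the paper's description:

```python
# A toy sketch of the optimization described above (all numbers invented):
# for each candidate split point, total energy covers edge compute, data
# transfer, and server compute; keep only configurations meeting the FPS
# and accuracy floors and pick the minimum-energy one.
candidates = [
    # (configuration, fps, accuracy, energy_joules)
    ("edge-only",   32.0, 0.721, 5.0),
    ("split@conv3", 35.0, 0.705, 3.1),
    ("split@conv5", 31.0, 0.690, 2.8),   # fails the accuracy floor
    ("server-only", 28.0, 0.721, 2.5),   # fails the FPS floor
]

FPS_MIN, ACC_MIN = 30.0, 0.697          # necessary conditions from the paper

feasible = [c for c in candidates if c[1] >= FPS_MIN and c[2] >= ACC_MIN]
best = min(feasible, key=lambda c: c[3])
```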

  • Weight Compression MAC Accelerator for Effective Inference of Deep Learning (Open Access)

    Asuka MAKI  Daisuke MIYASHITA  Shinichi SASAKI  Kengo NAKATA  Fumihiko TACHIBANA  Tomoya SUZUKI  Jun DEGUCHI  Ryuichi FUJIMOTO  

     
    PAPER-Integrated Electronics

      Publicized: 2020/05/15
      Vol: E103-C No:10
      Page(s): 514-523

    Many studies of deep neural networks have reported inference accelerators for improved energy efficiency. We propose methods for further improving energy efficiency while maintaining recognition accuracy, which were developed by the co-design of a filter-by-filter quantization scheme with variable bit precision and a hardware architecture that fully supports it. Filter-wise quantization reduces the average bit precision of weights, so execution times and energy consumption for inference are reduced in proportion to the total number of computations multiplied by the average bit precision of weights. The hardware utilization is also improved by a bit-parallel architecture suitable for granularly quantized bit precision of weights. We implement the proposed architecture on an FPGA and demonstrate that the execution cycles are reduced to 1/5.3 for ResNet-50 on ImageNet in comparison with a conventional method, while maintaining recognition accuracy.
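
Filter-wise quantization with variable bit precision can be sketched as follows; the per-filter bit widths and the uniform symmetric quantizer are illustrative assumptions, not the paper's exact scheme:

```python
import numpy as np

# Sketch of filter-by-filter quantization (bit widths invented): each
# convolution filter gets its own bit precision, and execution time scales
# roughly with the average bit width across filters.
rng = np.random.default_rng(2)

def quantize_filter(w: np.ndarray, bits: int) -> np.ndarray:
    """Uniform symmetric quantization of one filter to `bits` bits."""
    levels = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / levels
    return np.round(w / scale) * scale

filters = [rng.normal(0, 1, (3, 3)) for _ in range(4)]
bit_widths = [2, 4, 4, 8]                  # chosen per filter
quantized = [quantize_filter(w, b) for w, b in zip(filters, bit_widths)]
avg_bits = sum(bit_widths) / len(bit_widths)
```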

  • Sentence-Embedding and Similarity via Hybrid Bidirectional-LSTM and CNN Utilizing Weighted-Pooling Attention

    Degen HUANG  Anil AHMED  Syed Yasser ARAFAT  Khawaja Iftekhar RASHID  Qasim ABBAS  Fuji REN  

     
    PAPER-Natural Language Processing

      Publicized: 2020/08/27
      Vol: E103-D No:10
      Page(s): 2216-2227

    Neural networks have received considerable attention in sentence similarity measuring systems due to their efficiency in dealing with semantic composition. However, existing neural network methods are not sufficiently effective in capturing the most significant semantic information buried in an input. To address this problem, a novel weighted-pooling attention layer is proposed to retain the most remarkable attention vector. It has already been established that long short-term memory and convolutional neural networks have a strong ability to accumulate enriched patterns of whole-sentence semantic representation. First, a sentence representation is generated by employing a Siamese structure based on bidirectional long short-term memory and a convolutional neural network. Subsequently, a weighted-pooling attention layer is applied to obtain an attention vector. Finally, the attention vector pair information is leveraged to calculate the sentence similarity score. The amalgamation of bidirectional long short-term memory and a convolutional neural network results in a model with enhanced information extraction and learning capacity. Investigations show that the proposed method outperforms state-of-the-art approaches on datasets for two tasks, namely semantic relatedness and Microsoft Research paraphrase identification. The new model improves the learning capability and boosts the similarity accuracy.
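
The weighted-pooling attention layer can be sketched in a few lines; the shapes and the scoring vector are assumptions standing in for the learned BiLSTM/CNN outputs:

```python
import numpy as np

# Minimal sketch of a weighted-pooling attention layer (shapes assumed):
# score each hidden state with a learned vector, softmax the scores, and
# pool the states with those weights to get one attention vector.
rng = np.random.default_rng(3)
T, D = 6, 8                      # sentence length, hidden size
H = rng.normal(0, 1, (T, D))     # hidden states for one sentence
w = rng.normal(0, 1, D)          # learned scoring vector

scores = H @ w
alpha = np.exp(scores - scores.max())
alpha /= alpha.sum()             # attention weights over time steps
attention_vec = alpha @ H        # weighted pooling -> (D,)
```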

  • Sound Event Detection Utilizing Graph Laplacian Regularization with Event Co-Occurrence

    Keisuke IMOTO  Seisuke KYOCHI  

     
    PAPER-Speech and Hearing

      Publicized: 2020/06/08
      Vol: E103-D No:9
      Page(s): 1971-1977

    A limited number of types of sound events occur in an acoustic scene, and some sound events tend to co-occur in the scene; for example, the sound events “dishes” and “glass jingling” are likely to co-occur in the acoustic scene “cooking.” In this paper, we propose a method of sound event detection using graph Laplacian regularization that takes sound event co-occurrence into account. In the proposed method, the occurrences of sound events are expressed as a graph whose nodes indicate the frequencies of event occurrence and whose edges indicate the sound event co-occurrences. This graph representation is then utilized for the model training of sound event detection, which is optimized under an objective function with a regularization term considering the graph structure of sound event occurrence and co-occurrence. Evaluation experiments using the TUT Sound Events 2016 and 2017 datasets and the TUT Acoustic Scenes 2016 dataset show that the proposed method improves the performance of sound event detection by 7.9 percentage points compared with the conventional CNN-BiGRU-based detection method in terms of the segment-based F1 score. In particular, the experimental results indicate that the proposed method enables the detection of co-occurring sound events more accurately than the conventional method.
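
A graph Laplacian regularization term of the kind described above can be written down directly; the co-occurrence counts and embeddings below are invented, but the identity tr(EᵀLE) = ½ Σᵢⱼ Aᵢⱼ‖eᵢ − eⱼ‖² is the standard form of such a regularizer:

```python
import numpy as np

# Sketch of the regularizer (event names and counts invented): build a
# graph whose edge weights are event co-occurrence counts, form the
# Laplacian L = D - A, and penalize embeddings of frequently co-occurring
# events that lie far apart.
A = np.array([[0, 3, 0],          # co-occurrence counts between 3 events,
              [3, 0, 1],          # e.g. "dishes" and "glass jingling"
              [0, 1, 0]], float)  # co-occur often in "cooking"
L = np.diag(A.sum(axis=1)) - A    # graph Laplacian

E = np.array([[0.0, 1.0],         # one row per event (2-D embeddings)
              [0.1, 0.9],
              [1.0, 0.0]])

penalty = np.trace(E.T @ L @ E)
pairwise = 0.5 * sum(A[i, j] * np.sum((E[i] - E[j]) ** 2)
                     for i in range(3) for j in range(3))
```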

  • Neural Networks Probability-Based PWL Sigmoid Function Approximation

    Vantruong NGUYEN  Jueping CAI  Linyu WEI  Jie CHU  

     
    LETTER-Biocybernetics, Neurocomputing

      Publicized: 2020/06/11
      Vol: E103-D No:9
      Page(s): 2023-2026

    In this letter, a piecewise linear (PWL) sigmoid function approximation based on the statistical distribution probability of the neurons' values in each layer is proposed to improve the network recognition accuracy using only addition circuits. The sigmoid function is first divided into three fixed regions, and then, according to the distribution probability of the neurons' values, the curve in each region is segmented into sub-regions to reduce the approximation error and improve the recognition accuracy. Experiments performed on Xilinx's FPGA-XC7A200T for the MNIST and CIFAR-10 datasets show that the proposed method achieves 97.45% recognition accuracy with a DNN and 98.42% with a CNN on MNIST, and 72.22% on CIFAR-10, up to 0.84%, 0.57%, and 2.01% higher than other approximation methods using only addition circuits.
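
A plain piecewise-linear sigmoid (without the letter's probability-driven sub-region segmentation) can be sketched as follows; the breakpoints are invented:

```python
import numpy as np

# Sketch of a piecewise-linear sigmoid: sample the true sigmoid at a few
# breakpoints and interpolate linearly between them, so inference needs
# only comparisons, additions, and precomputed slopes.
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

breakpoints = np.array([-8.0, -4.0, -2.0, -1.0, 0.0, 1.0, 2.0, 4.0, 8.0])
values = sigmoid(breakpoints)    # exact values stored at the breakpoints

def pwl_sigmoid(x):
    return np.interp(x, breakpoints, values)  # linear between breakpoints

xs = np.linspace(-8, 8, 1001)
max_err = np.abs(pwl_sigmoid(xs) - sigmoid(xs)).max()
```

Denser breakpoints where the neurons' values concentrate (the letter's idea) would shrink the error exactly where it matters for accuracy.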

  • Joint Adversarial Training of Speech Recognition and Synthesis Models for Many-to-One Voice Conversion Using Phonetic Posteriorgrams

    Yuki SAITO  Kei AKUZAWA  Kentaro TACHIBANA  

     
    PAPER-Speech and Hearing

      Publicized: 2020/06/12
      Vol: E103-D No:9
      Page(s): 1978-1987

    This paper presents a method for many-to-one voice conversion using phonetic posteriorgrams (PPGs) based on an adversarial training of deep neural networks (DNNs). A conventional method for many-to-one VC can learn a mapping function from input acoustic features to target acoustic features through separately trained DNN-based speech recognition and synthesis models. However, 1) the differences among speakers observed in PPGs and 2) an over-smoothing effect of generated acoustic features degrade the converted speech quality. Our method performs a domain-adversarial training of the recognition model for reducing the PPG differences. In addition, it incorporates a generative adversarial network into the training of the synthesis model for alleviating the over-smoothing effect. Unlike the conventional method, ours jointly trains the recognition and synthesis models so that they are optimized for many-to-one VC. Experimental evaluation demonstrates that the proposed method significantly improves the converted speech quality compared with conventional VC methods.

  • Deep Learning Approaches for Pathological Voice Detection Using Heterogeneous Parameters

    JiYeoun LEE  Hee-Jin CHOI  

     
    LETTER-Speech and Hearing

      Publicized: 2020/05/14
      Vol: E103-D No:8
      Page(s): 1920-1923

    We propose a deep learning-based model for classifying pathological voices using a convolutional neural network and a feedforward neural network. The model uses combinations of heterogeneous parameters, including mel-frequency cepstral coefficients, linear predictive cepstral coefficients and higher-order statistics. We validate the accuracy of this model using the Massachusetts Eye and Ear Infirmary (MEEI) voice disorder database and the Saarbruecken Voice Database (SVD). Our model achieved an accuracy of 99.3% for MEEI and 75.18% for SVD. This model achieved an accuracy that is 7.18% higher than that of competitive models in previous studies.

  • Silent Speech Interface Using Ultrasonic Doppler Sonar

    Ki-Seung LEE  

     
    PAPER-Speech and Hearing

      Publicized: 2020/05/20
      Vol: E103-D No:8
      Page(s): 1875-1887

    Some non-acoustic modalities have the ability to reveal certain speech attributes that can be used for synthesizing speech signals without acoustic signals. This study validated the use of ultrasonic Doppler frequency shifts caused by facial movements to implement a silent speech interface system. A 40kHz ultrasonic beam is incident to a speaker's mouth region. The features derived from the demodulated received signals were used to estimate the speech parameters. A nonlinear regression approach was employed in this estimation where the relationship between ultrasonic features and corresponding speech is represented by deep neural networks (DNN). In this study, we investigated the discrepancies between the ultrasonic signals of audible and silent speech to validate the possibility for totally silent communication. Since reference speech signals are not available in silently mouthed ultrasonic signals, a nearest-neighbor search and alignment method was proposed, wherein alignment was achieved by determining the optimal pair of ultrasonic and audible features in the sense of a minimum mean square error criterion. The experimental results showed that the performance of the ultrasonic Doppler-based method was superior to that of EMG-based speech estimation, and was comparable to an image-based method.

  • Improvement of Luminance Isotropy for Convolutional Neural Networks-Based Image Super-Resolution

    Kazuya URAZOE  Nobutaka KUROKI  Yu KATO  Shinya OHTANI  Tetsuya HIROSE  Masahiro NUMA  

     
    LETTER-Image

      Vol: E103-A No:7
      Page(s): 955-958

    Convolutional neural network (CNN)-based image super-resolutions are widely used as a high-quality image-enhancement technique. However, in general, they show little to no luminance isotropy. Thus, we propose two methods, “Luminance Inversion Training (LIT)” and “Luminance Inversion Averaging (LIA),” to improve the luminance isotropy of CNN-based image super-resolutions. Experimental results of 2× image magnification show that the average peak signal-to-noise ratio (PSNR) using Luminance Inversion Averaging is about 0.15-0.20dB higher than that for the conventional super-resolution.

  • Heatmapping of Group People Involved in the Group Activity

    Kohei SENDO  Norimichi UKITA  

     
    PAPER

      Publicized: 2020/03/18
      Vol: E103-D No:6
      Page(s): 1209-1216

    This paper proposes a method for heatmapping people who are involved in a group activity. Such people grouping is useful for understanding group activities. In prior work, people grouping is performed based on simple inflexible rules and schemes (e.g., based on proximity among people and with models representing only a constant number of people). In addition, several previous grouping methods require the results of action recognition for individual people, which may include erroneous results. On the other hand, our proposed heatmapping method can group any number of people who dynamically change their deployment. Our method can work independently of individual action recognition. A deep network for our proposed method consists of two input streams (i.e., RGB and human bounding-box images). This network outputs a heatmap representing pixelwise confidence values of the people grouping. Extensive exploration of appropriate parameters was conducted in order to optimize the input bounding-box images. As a result, we demonstrate the effectiveness of the proposed method for heatmapping people involved in group activities.

  • Loss-Driven Channel Pruning of Convolutional Neural Networks

    Xin LONG  Xiangrong ZENG  Chen CHEN  Huaxin XIAO  Maojun ZHANG  

     
    LETTER-Artificial Intelligence, Data Mining

      Publicized: 2020/02/17
      Vol: E103-D No:5
      Page(s): 1190-1194

    The increase in the computation cost and storage of convolutional neural networks (CNNs) has severely hindered their application on resource-limited devices in recent years. As a result, there is a pressing need to accelerate the networks. In this paper, we propose a loss-driven method to prune redundant channels of CNNs. It identifies unimportant channels by using a Taylor expansion technique with respect to the scaling and shifting factors, and prunes those channels by a fixed percentile threshold. By doing so, we obtain a compact network with fewer parameters and lower FLOPs consumption. In the experimental section, we evaluate the proposed method on the CIFAR datasets with several popular networks, including VGG-19, DenseNet-40 and ResNet-164, and the experimental results demonstrate that the proposed method is able to prune over 70% of channels and parameters with no performance loss. Moreover, iterative pruning can be used to obtain an even more compact network.
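
The Taylor-expansion channel ranking can be sketched as follows; the factor values are random and the 70% percentile is invented, matching only the pruning ratio reported above:

```python
import numpy as np

# Sketch of loss-driven channel ranking (numbers invented): a first-order
# Taylor estimate of the loss change from pruning channel c is
# |g_c * dL/dg_c| for its scaling factor g_c; channels below a fixed
# percentile of this importance score are pruned.
rng = np.random.default_rng(4)
n_channels = 100
gamma = rng.normal(1.0, 0.3, n_channels)       # scaling factors
grad_gamma = rng.normal(0.0, 0.1, n_channels)  # dL/dgamma from backprop

importance = np.abs(gamma * grad_gamma)        # Taylor-expansion criterion
threshold = np.percentile(importance, 70)      # prune the bottom 70%
keep_mask = importance > threshold

n_pruned = int((~keep_mask).sum())
```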

  • Vehicle Key Information Detection Algorithm Based on Improved SSD

    Ende WANG  Yong LI  Yuebin WANG  Peng WANG  Jinlei JIAO  Xiaosheng YU  

     
    PAPER-Intelligent Transport System

      Vol: E103-A No:5
      Page(s): 769-779

    With the rapid development of technology and the economy, the number of cars is increasing rapidly, which brings a series of traffic problems. To solve these traffic problems, the development of intelligent transportation systems is being accelerated in many cities. While the detection of vehicles and their detailed information is of great significance to the development of urban intelligent transportation systems, traditional vehicle detection algorithms are not satisfactory in cases of complex environments and high real-time requirements. Vehicle detection algorithms based on motion information are unable to detect stationary vehicles in video. At present, the application of deep learning methods to the task of target detection effectively alleviates the problems of traditional algorithms. However, there are few datasets for detailed vehicle information, i.e. driver, car inspection sign, copilot, plate and vehicle object, which are key information for intelligent transportation. This paper constructs a deep learning dataset containing 10,000 representative images of vehicles and their key information. Then, the SSD (Single Shot MultiBox Detector) target detection algorithm is improved and the improved algorithm is applied to a video surveillance system. The detection accuracy of small targets is improved by adding deconvolution modules to the detection network. The experimental results show that the proposed method can detect the vehicle, driver, car inspection sign, copilot and plate, which are the vehicle key information, at the same time, and the improved algorithm achieves better accuracy and real-time performance in video surveillance than the SSD algorithm.

  • Patient-Specific ECG Classification with Integrated Long Short-Term Memory and Convolutional Neural Networks

    Jiaquan WU  Feiteng LI  Zhijian CHEN  Xiaoyan XIANG  Yu PU  

     
    PAPER-Biological Engineering

      Publicized: 2020/02/13
      Vol: E103-D No:5
      Page(s): 1153-1163

    This paper presents an automated patient-specific ECG classification algorithm, which integrates long short-term memory (LSTM) and convolutional neural networks (CNN). While the LSTM extracts temporal features, such as heart rate variability (HRV) and beat-to-beat correlation, from sequential heartbeats, the CNN captures detailed morphological characteristics of the current heartbeat. To further improve the classification performance, adaptive segmentation and re-sampling are applied to align the heartbeats of different patients with various heart rates. In addition, a novel clustering method is proposed to identify the most representative patterns from the common training data. Evaluated on the MIT-BIH arrhythmia database, our algorithm shows superior accuracy for both ventricular ectopic beat (VEB) and supraventricular ectopic beat (SVEB) recognition. In particular, the sensitivity and positive predictive rate for SVEB increase by more than 8.2% and 8.8%, respectively, compared with prior works. Since our patient-specific classification does not require manual feature extraction, it is potentially applicable to embedded devices for automatic and accurate arrhythmia monitoring.
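
The re-sampling used to align heartbeats of different lengths can be sketched with linear interpolation; the beat lengths, waveform, and target length are assumptions:

```python
import numpy as np

# Sketch of the alignment idea (lengths invented): heartbeats from patients
# with different heart rates span different numbers of samples, so each
# beat is re-sampled to a fixed length before the CNN sees it.
TARGET_LEN = 128

def resample_beat(beat: np.ndarray, target_len: int = TARGET_LEN):
    src = np.linspace(0.0, 1.0, len(beat))
    dst = np.linspace(0.0, 1.0, target_len)
    return np.interp(dst, src, beat)

slow_beat = np.sin(np.linspace(0, 2 * np.pi, 200))  # slow heart rate
fast_beat = np.sin(np.linspace(0, 2 * np.pi, 90))   # fast heart rate

aligned = [resample_beat(b) for b in (slow_beat, fast_beat)]
```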

  • A Highly Configurable 7.62GOP/s Hardware Implementation for LSTM

    Yibo FAN  Leilei HUANG  Kewei CHEN  Xiaoyang ZENG  

     
    PAPER-Integrated Electronics

      Publicized: 2019/11/27
      Vol: E103-C No:5
      Page(s): 263-273

    The neural network has been one of the most useful techniques in the areas of speech recognition, language translation and image analysis in recent years. Long Short-Term Memory (LSTM), a popular type of recurrent neural network (RNN), has been widely implemented on CPUs and GPUs. However, those software implementations offer poor parallelism, while the existing hardware implementations lack configurability. To bridge this gap, a highly configurable 7.62 GOP/s hardware implementation for LSTM is proposed in this paper. To achieve this goal, the work flow is carefully arranged to make the design compact and high-throughput; the structure is carefully organized to make the design configurable; the data buffering and compression strategy is carefully chosen to lower the bandwidth without increasing the complexity of the structure; and the data type, logistic sigmoid (σ) function and hyperbolic tangent (tanh) function are carefully optimized to balance the hardware cost and accuracy. This work achieves a performance of 7.62 GOP/s @ 238 MHz on an XCZU6EG FPGA, using only 3K look-up tables (LUTs). Compared with the implementation on an Intel Xeon E5-2620 CPU @ 2.10GHz, this work achieves about 90× speedup for small networks and 25× speedup for large ones. The resource consumption is also much less than that of state-of-the-art works.

  • Gradient-Enhanced Softmax for Face Recognition

    Linjun SUN  Weijun LI  Xin NING  Liping ZHANG  Xiaoli DONG  Wei HE  

     
    LETTER-Artificial Intelligence, Data Mining

      Publicized: 2020/02/07
      Vol: E103-D No:5
      Page(s): 1185-1189

    This letter proposes a gradient-enhanced softmax supervisor for face recognition (FR) based on a deep convolutional neural network (DCNN). The proposed supervisor computes the constant-normalized cosine to obtain the score for each class, using a combination of the intra-class score and the soft maximum of the inter-class scores as the objective function. This mitigates the vanishing gradient problem in the conventional softmax classifier. Experiments on the public Labeled Faces in the Wild (LFW) database show that the proposed supervisor achieves better results than current state-of-the-art softmax-based approaches for FR.

  • A Deep Neural Network-Based Approach to Finding Similar Code Segments

    Dong Kwan KIM  

     
    LETTER-Software Engineering

      Publicized: 2020/01/17
      Vol: E103-D No:4
      Page(s): 874-878

    This paper presents a Siamese architecture model with two identical Convolutional Neural Networks (CNNs) to identify code clones; two code fragments are represented as Abstract Syntax Trees (ASTs), CNN-based subnetworks extract feature vectors from the ASTs of pairwise code fragments, and the output layer produces how similar or dissimilar they are. Experimental results demonstrate that CNN-based feature extraction is effective in detecting code clones at source code or bytecode levels.
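
The Siamese decision stage can be sketched as follows; the feature vectors and the similarity threshold are invented, and the cosine score stands in for whatever output layer the paper actually uses:

```python
import numpy as np

# Sketch of the Siamese decision stage (feature extraction omitted): the
# two identical CNN subnetworks each reduce an AST to a feature vector,
# and the pair is scored for clone-ness, here with cosine similarity
# against a validation-tuned threshold (0.8 below is invented).
def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def is_clone(feat_a, feat_b, threshold=0.8):
    return cosine(feat_a, feat_b) >= threshold

f1 = np.array([0.9, 0.1, 0.4, 0.7])     # stand-in AST feature vectors
f2 = np.array([0.88, 0.12, 0.41, 0.69]) # near-duplicate of f1
f3 = np.array([-0.5, 0.9, -0.2, 0.1])   # unrelated fragment
```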

  • Robust CAPTCHA Image Generation Enhanced with Adversarial Example Methods

    Hyun KWON  Hyunsoo YOON  Ki-Woong PARK  

     
    LETTER-Information Network

      Publicized: 2020/01/15
      Vol: E103-D No:4
      Page(s): 879-882

    Malicious attackers on the Internet use automated attack programs to disrupt the use of services via mass spamming, unnecessary bulletin board posting, and account creation. A Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA) is used as a security solution to prevent such automated attacks. CAPTCHA is a system that determines whether the user is a machine or a person by providing distorted letters, voices, and images that only humans can understand. However, new attack techniques such as optical character recognition (OCR) and deep neural networks (DNN) have been used to bypass CAPTCHA. In this paper, we propose a method to generate CAPTCHA images by using the fast gradient sign method (FGSM), iterative FGSM (I-FGSM), and the DeepFool method. We used the CAPTCHA images provided by Python as the dataset and TensorFlow as the machine learning library. The experimental results show that the CAPTCHA images generated via the FGSM, I-FGSM, and DeepFool methods exhibit a 0% recognition rate with ε=0.15 for FGSM, a 0% recognition rate with α=0.1 and 50 iterations for I-FGSM, and a 45% recognition rate with 150 iterations for the DeepFool method.
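
The FGSM perturbation at the core of the method is one line; the image and gradient below are random stand-ins for a real CAPTCHA and the attacker model's gradient, with ε=0.15 taken from the experiment above:

```python
import numpy as np

# Sketch of the FGSM step itself (the CAPTCHA pipeline and the gradient of
# the attacker's recognition model are stood in by random arrays):
# x_adv = clip(x + eps * sign(dLoss/dx)).
rng = np.random.default_rng(5)
eps = 0.15

x = rng.uniform(0.0, 1.0, (28, 28))      # a CAPTCHA image in [0, 1]
grad = rng.normal(0.0, 1.0, x.shape)     # stand-in for dLoss/dx

x_adv = np.clip(x + eps * np.sign(grad), 0.0, 1.0)
max_change = np.abs(x_adv - x).max()
```

I-FGSM would repeat this step with a smaller α and re-clip after each iteration; DeepFool instead walks toward the nearest decision boundary.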

  • Software Development Effort Estimation from Unstructured Software Project Description by Sequence Models

    Tachanun KANGWANTRAKOOL  Kobkrit VIRIYAYUDHAKORN  Thanaruk THEERAMUNKONG  

     
    PAPER

      Publicized: 2020/01/14
      Vol: E103-D No:4
      Page(s): 739-747

    Most existing methods of effort estimation in software development are manual, labor-intensive and subjective, resulting in overestimation that loses bids and underestimation that loses money. This paper investigates the effectiveness of sequence models for estimating development effort, in the form of man-months, from software project data. Four architectures: (1) average word-vector with Multi-Layer Perceptron (MLP), (2) average word-vector with Support Vector Regression (SVR), (3) Gated Recurrent Unit (GRU) sequence model, and (4) Long Short-Term Memory (LSTM) sequence model are compared in terms of man-month difference. The approach is evaluated using two datasets: ISEM (1,573 English software project descriptions, raw text) and ISBSG (9,100 software project records, a structured data table describing the characteristics of software projects). The LSTM sequence model achieves the lowest and the second-lowest mean absolute errors, 0.705 man-months on ISEM and 14.077 man-months on ISBSG, respectively. The MLP model achieves the lowest mean absolute error on ISBSG, 14.069 man-months.
